Chapter 9: Strings

Notes

Need more information on ASCII.
Need the name/number of the ANSI table that is identical to ASCII for the first 128 characters (indices 0 to 127).
Possible exercise: switch every two characters in a string: "Hello World" = "eHll ooWlrd" or something.
Possible exercise: write a function to determine if a string contains nothing (is NULL, empty, or blank).

Text is Character Strings

I've been preaching numbers and programming for so long you might have forgotten text: alphabetic characters, punctuation, etc. The truth is that text is represented by strings in computer memory and logic. A string is a list of values. The term can be applied to any list of values, and to a lot more in the real world. If you have beads on a necklace, you have a string of beads; people standing in line are a string of people, and so on. But I'm going to narrow things down a bit. In programming, the term string is almost universally an abbreviation of character string, which is a string of characters. From now on, when I use the term "string" without context, I am referring to character strings.

All the time you've been reading written language, you've been seeing character strings. Each value in the string is a character; that is, each value is not just a letter but also each space, punctuation mark, etc. Typically strings are represented by enclosing them in double quotes:

"Hello World"

The string above consists of the characters 'H', 'e', 'l', 'l', 'o', ' ' (space), 'W', 'o', 'r', 'l', and 'd'. The double quotes enclosing the string are not part of the string itself. You've probably seen this particular string before, and you've been using it and others without realizing, perhaps, that it is a string.

In memory a string is still an array, regardless of whether it is a string of characters or integers. Each character is actually an index into a character table. This reflects the fact that everything breaks down to a number in computer programming. The character table is also known as a code page or character set.

Code pages can differ from system to system. However, most systems have code pages with 256 different indices (0 to 255), and therefore each character in a string will occupy a single byte[1] in memory. You can represent numbers via named values or literals; the same is true of characters. A character literal is just that. However, a character literal actually represents a numeric index into the current code page, so what you see is not necessarily what you get :).

Character Literals

To represent a single text character you must surround it in apostrophes, also known as single quotes. To use the lower-case character "a"[2] we would write it as such:

'a'

This literal actually equates to a numerical index in the current code page. The most common code page is either ANSI <number> or ASCII. ANSI <number> and ASCII are the same for the first 128 indices (0 to 127). A character literal, like 'a' above, can be used any place you would normally use a numeric literal. You could assign it to a variable, use it in a computational expression, or use it with 'cout':

int x = 'a';          // assign index value to 'x'
int y = 'a' + 10;     // use in computational expression
cout << 'a' << endl;  // output with 'cout'

A character literal is interpreted by the compiler as its numerical index. Thus the code above does not actually put 'a' into 'x'; it puts the index of 'a' from the current character table into 'x'. Unless your computer is in some parallel universe, if you are using a PC then the letter 'a' is actually the number ninety-seven (97). Therefore the variable 'x' above will contain ninety-seven (97) and 'y' will contain the value one hundred seven (107) after the code is executed.

ASCII and Assumptions

I'm going to assume that you are using an ASCII system for any of this to make real sense. That is, a platform (such as an operating system and underlying BIOS) and a compatible compiler that use the ASCII table for determining which indices correspond to which characters.

I use the term characters because not all of them are letters or numbers. Each individual thing that you can see in plain text is a single character. This includes spaces, punctuation, and even special characters such as new-lines and tabs. Lower and upper-case letters are individual characters too. A character is just an individual thing, so a lower-case character is considered a separate thing from its conceptually upper-case counterpart. The computer processor sees them as different sets of bits, whereas we see them as different forms of the same letter. It is probably for this reason that many languages are case sensitive: it's easier to program them that way and it's inevitably faster.

Author's Opinion: Case sensitivity is ridiculous. I think variables should be case sensitive only when it concerns the same variable. I don't think two variables of the same name with different case should be allowed. At the very least compilers should suggest possible names that you mis-cased. For example if you write SomeVar instead of someVar, it should say: "Perhaps you meant 'someVar' you knuckle-head!"

My examples will make more sense to you if you are using ASCII because we will have the same results. The letter "A" has an index of 65 in ASCII, and if it does on your machine as well then we're working on the same ground. Try this out in a small program:

cout << "The index of 'A' is " << (int)'A' << endl;

On my system this results in the following output:

The index of 'A' is 65

If you have the same then joyous day, you're working with an ASCII-compatible system. ANSI <number> is identical to this character set for the first 128 characters (indices 0 to 127). Luckily for us, that's where all the important ones are. Some systems actually use only 7 bits to store individual characters, and some formats expect only 7 bits to be used. Seven bits allow for 128 different characters, and that's no coincidence.

Char

A character literal has the 'char' data type. The name 'char' gives that away almost immediately. Usually character literals are used in conjunction with 'char' variables:

char c = 'a';
cout << c << endl;

When you output a 'char' variable with 'cout', it will output a character rather than the number stored. That is to say, 'cout' knows that 'char' variables represent indices into the current code page and will output the character found there, using the 'char' variable's value as the index.

Don't forget that a 'char' is still an integer-based variable and can store a number. You can use it in computations or pointer arithmetic. An example:

char c = 'a';
c++;
cout << c << endl;

The above will put the index of the character 'a' into the 'char' variable 'c'. Next 'c' is incremented. Lastly, the character represented by 'c' is printed. The character printed will not be 'a'; it will be the character at the next index in the code page. Typically letters are stored sequentially in the code page, and the output of this will probably be 'b'.
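To push the increment idea a little further, here is a minimal sketch that walks from 'a' to 'z' one index at a time. The function name print_alphabet is my own invention, and the sketch assumes the lower-case letters sit at consecutive indices, which is true of ASCII but not of every character set.

```cpp
#include <iostream>

// Hypothetical helper: prints the lower-case letters by counting upward
// from 'a'. Assumes the letters occupy consecutive indices (true in ASCII).
// Returns how many characters were printed.
int print_alphabet()
{
    int printed = 0;
    for (char c = 'a'; c <= 'z'; c++)
    {
        std::cout << c;   // 'cout' shows the character, not the number
        printed++;
    }
    std::cout << std::endl;
    return printed;
}
```

On an ASCII system a call to print_alphabet() should print the twenty-six letters in order, because each c++ simply moves to the next index in the code page.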

Using 'char' with 'cin', a user can enter an actual character rather than a number. For example:

char c;
cin >> c;

This would expect the user to input a single character, not a number. The 'cin' functionality recognizes the 'char' data type and therefore expects different input. Many standard functions treat the 'char' data type much differently from the other numeric types for the sole reason that it typically represents a single text character as well as a number.

If you get nothing else from this chapter, nay this entire book, let it be that you understand that characters are stored as numbers. Remember I speak of characters, not letters. Each numerical digit is actually a character in the code page as well:

char c = '0';

In the above, the character zero is stored into the 'char' variable 'c'. Remember the compiler interprets characters by their index into the code page. The index of the character '0' (zero) in ASCII is 48. Thus, on an ASCII system the compiler sees the following, which would be processed identically should you write it yourself:

char c = 48;

The character "0" (zero) is a representation of the index 48. This is different from the numerical value 0 (zero):

char c = 0;

That is putting the actual value zero (0) into the 'char' variable 'c'. At index zero (0), or the first index, in the ASCII character table we find the NULL character (as explained later). Obviously the NULL character and the character "0" (zero) are not the same. So, please tell me that you can deduce the result of the following condition:

if ('0' == 0)

If this is boggling your mind, please feel free to take a second, step away from this text, and just ponder it. What we take for granted as text in the computer is merely a list of numbers, and below that it's only zeroes and ones, so be glad I'm dealing with you at this high level! :) A computer only understands numbers, so things like letters and punctuation must be categorized as characters and each assigned an individual "id" or index. Which characters are categorized, and at what index, depends on the character set (code page). In ASCII there are 128 characters categorized, each given an index from zero (0) to one hundred twenty-seven (127); the 256-entry code pages mentioned earlier extend that list with another 128.
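If the difference between the character '0' and the value 0 is still slippery, a small sketch may help. The two helpers below are names I made up for illustration (digit_value and show_index are not standard functions). The first turns a digit character into the number it depicts by subtracting the index of '0', which works because C++ guarantees that the digits '0' through '9' sit at consecutive indices.

```cpp
#include <iostream>

// Hypothetical helper: returns 0..9 for the characters '0'..'9',
// or -1 for any character that is not a decimal digit.
int digit_value(char c)
{
    if (c < '0' || c > '9')
        return -1;
    return c - '0';   // distance from the index of '0'
}

// Hypothetical helper: prints a character next to its index in the code page.
void show_index(char c)
{
    std::cout << "'" << c << "' has index "
              << static_cast<int>(c) << std::endl;
}
```

On an ASCII system show_index('0') should report an index of 48, while digit_value('0') returns the plain number 0. The character and the value really are two different things.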

We are not unlike computers. You cannot write something that completely explains a single person, so you use their name. My name is Neil and it is used in writing because there is no letter in any alphabet to sufficiently and completely describe my entire person. Computers are the same way with characters. Just as we cannot use a single letter for each thing in our environment, including ourselves, a computer can only use numbers to describe things. Rather than write the character itself into its brain, it inscribes the character's number.

To be done: example program inputting characters.

Character Functions

Although it is nice to all be using the ASCII character set, it is far from being a world-wide custom. In ASCII you know which indices represent which characters simply by looking them up yourself. You could tell whether a user entered a numerical digit into a character by comparing it to the ten indices in ASCII that represent digits. Or, you could use the isdigit() function, which does the same thing but will work on whatever character set you are compiling for.

This function, and its compatriots, can only be used after including <ctype.h> at the top of your program. I'm pretty sure the name of this header file is an abbreviation for "Character Type", alluding to functions dealing with the type of characters.

The isdigit() function accepts a single character parameter and returns either a non-zero value (true) or zero (0) (false) depending on whether the character passed in is a digit. The declaration of isdigit() describes it as accepting an 'int' parameter, which sounds unfamiliar where characters are concerned. It is perfectly acceptable, however, to pass in a 'char' variable or character literal:

char c = 'a';

if (isdigit('0'))
{
    // this will be executed
}

if (isdigit(c))
{
    // this will not
}

All of the character determination functions are like isdigit() in that they accept an 'int' parameter for the character to check and return an 'int' value that is either non-zero (for true) or zero (0) (for false). Since these functions all act like isdigit(), I have provided a list of them along with brief descriptions below.

isalnum	a letter or a digit
isalpha	a letter
iscntrl	any control character
isdigit	a digit
isgraph	any printing character except for the space character
islower	a lowercase letter
isprint	any printing character
ispunct	any punctuation character
isspace	a white space character (space, tab, new line, etc.)
isupper	an uppercase letter
isxdigit	a hexadecimal digit
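As a rough sketch of how several of these functions might be combined, the hypothetical function below (describe is my own name) reports which category a character falls into first. The cast to 'unsigned char' is a precaution I've added: on systems where 'char' can be negative, passing a negative value to these functions is not allowed.

```cpp
#include <ctype.h>

// Hypothetical helper: returns a short description of the first
// category that matches the character.
const char *describe(char c)
{
    unsigned char uc = static_cast<unsigned char>(c);

    if (isdigit(uc)) return "digit";
    if (isupper(uc)) return "uppercase letter";
    if (islower(uc)) return "lowercase letter";
    if (isspace(uc)) return "white space";
    if (ispunct(uc)) return "punctuation";
    return "something else";
}
```

The order of the tests matters because the categories overlap. A tab, for instance, is both a white space character and a control character, and this sketch reports it as white space simply because that test comes first.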

Because lower and upper-case letters are seen as totally separate characters, there are functions to switch between them. These are toupper() and tolower() and, like the "is" functions above, <ctype.h> must be included in order to use them.

Both of these letter conversion functions accept an 'int' parameter corresponding to the character to convert and return an 'int' representing the converted character:

char chi = toupper('a');
char clo = tolower(chi);

After the above code is executed the 'char' variable 'chi' will contain the character 'A' (or rather the index representing 'A', remember!) and 'clo' will contain the character 'a'.
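The "is" and "to" functions work well together. Here is a small sketch, assuming nothing beyond <ctype.h>, of a function that flips the case of a letter and leaves every other character alone. The name swap_case is hypothetical.

```cpp
#include <ctype.h>

// Hypothetical helper: lower-case letters become upper-case and vice
// versa; digits, punctuation, and everything else pass through unchanged.
char swap_case(char c)
{
    unsigned char uc = static_cast<unsigned char>(c);

    if (islower(uc))
        return static_cast<char>(toupper(uc));
    if (isupper(uc))
        return static_cast<char>(tolower(uc));
    return c;
}
```

Calling swap_case('a') gives back 'A', swap_case('Q') gives back 'q', and swap_case('5') gives back '5' untouched.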

String Literals and Constants

A string literal is a representation of a constant array of characters. That is, the string can be used in the same places as a value of type 'const char []'. To create a string literal, simply enclose text within double quotation marks, also known as double quotes:

"string literal"

As mentioned before, everything within the double quotes is part of the string literal, but the quotes themselves are not. In addition to the characters you've typed, an extra one is automatically added for you: the NULL character. Yes, if you're familiar with pointers then the term should come as no surprise. A NULL character is one with a numeric index of zero. Therefore the literal above contains the characters 's', 't', 'r', 'i', 'n', 'g', ' ' (space), 'l', 'i', 't', 'e', 'r', 'a', 'l', and 0 (null). It also means that the following is equivalent:

const char mystr[15] = { 's', 't', 'r', 'i', 'n', 'g', ' ',
                         'l', 'i', 't', 'e', 'r', 'a', 'l', 0 };

Notice that the size of the array includes the NULL character. The NULL character is part of the string; it is just unseen. The NULL character is used to know where a string ends. When you output a string with 'cout', it loops through all of the subscripts in the string (character array) and outputs each character. It ends the loop when it reaches a NULL character. The NULL character is also known as the termination character because it terminates the visible portion of the string.

A character string is a character array, and elements may exist in it past the NULL character. Those elements, however, would not be printed when using 'cout' because it stops at the first NULL character. If a NULL character were not used to terminate the string, 'cout' would not know when to stop outputting characters and would most likely go past the end of the string into no-strings memory :).

Think of a character string as beads on a string. The beads will simply fall off the end of the string unless there is a knot at the end to keep them in. A blind man can determine where the last bead is by the knot that follows it. Likewise, more beads may exist beyond the knot, but where they end cannot be determined unless a knot follows them as well.
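To make the knot a bit more concrete, here is a minimal sketch of feeling along the string for it. Both function names are hypothetical; visible_length counts the beads before the first knot, and demo_hidden_beads builds an array with extra beads past the knot to show that 'cout' never reaches them.

```cpp
#include <iostream>

// Hypothetical helper: counts the characters before the first NULL character.
int visible_length(const char *s)
{
    int n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

// Builds an array holding characters on both sides of a NULL character,
// prints it, and returns the length of the part that was visible.
int demo_hidden_beads()
{
    char beads[] = { 'k', 'n', 'o', 't', 0, 'l', 'o', 's', 't', 0 };

    std::cout << beads << std::endl;   // output stops at the first NULL character
    return visible_length(beads);
}
```

The array holds ten elements, but only the four characters before the first NULL character are printed or counted. The characters after it are still sitting in memory, just out of sight.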

Variable Strings

A string literal is a constant array of characters. To create a string that can be changed, you simply create a string variable, which is actually just an array of 'char' subscripts. A 'char' array can be used by itself with 'cout' just like a string literal or constant:

char mystr[] = { 'H', 'e', 'l', 'l', 'o', ' ',
                 'W', 'o', 'r', 'l', 'd', 0 };

cout << mystr << endl;

The output from the above would be:

Hello World

That's simple enough. As you can see, text is merely several characters that are represented (deep down) by numbers. Luckily, we get to work with character and string literals rather than the numbers themselves.

Once a string is created, it can obviously be modified. Since it is still an array like anything else, you can modify the subscripts individually yourself or call standard helper functions which do common things for you. The latter is much easier, but teaches you less about what is really going on. Thus, helper functions will not be covered until much later in this chapter.

With a variable string the characters that make it up can be changed. For instance, perhaps we want to change the string to "Goodbye World". This sounds like a very simple thing. On paper you would simply erase "Hello" and scribble in "Goodbye". But notice that "Goodbye" occupies more characters than "Hello". Therefore to change "Hello" to "Goodbye" we'll need two (2) extra characters, fourteen (14) in total to store "Goodbye World".

Firstly we'll need to make sure that the string has enough space to store "Goodbye World", which means creating it with fourteen (14) subscripts. This is enough for the thirteen (13) visible characters as well as the terminating NULL character:

char mystr[14];

We'll start by initializing it to "Hello World" like before, which means giving that declaration an initializer:

char mystr[14] = { 'H', 'e', 'l', 'l', 'o', ' ',
                   'W', 'o', 'r', 'l', 'd', 0 };

Now, if the characters of "Goodbye" replace the first characters of 'mystr', the goal has not yet been achieved:

mystr[0] = 'G';
mystr[1] = 'o';
mystr[2] = 'o';
mystr[3] = 'd';
mystr[4] = 'b';
mystr[5] = 'y';
mystr[6] = 'e';

The string now contains "Goodbyeorld", a far cry from "Goodbye World". Filling in the subscripts does not automagically move over the previously existing " World". Unfortunately in this situation your two options both involve setting all the subscripts to store " World". One method would be to simply set all the subscripts for the entire "Goodbye World" string. The second would be to shift the subscripts storing the " World" characters (including the space) over by two (2). The shift has to happen before "Goodbye" is written in, while the space and the 'W' are still intact, so starting again from "Hello World":

mystr[12] = mystr[10]; // move the 'd' character
mystr[11] = mystr[9];  // move the 'l' character
mystr[10] = mystr[8];  // move the 'r' character
mystr[9] = mystr[7];   // move the 'o' character
mystr[8] = mystr[6];   // move the 'W' character
mystr[7] = mystr[5];   // move the ' ' character

Notice anything missing? The NULL character needs to be moved as well, so the following statement would precede the above code:

mystr[13] = mystr[11]; // move the NULL character

There is a logical reason for performing the shifting backwards. That is, I set the subscript values in reverse order from last to first, thirteen (13) to seven (7). If it were done the other way around, the string would not get shifted; it would get garbled. Consider:

mystr[7] = mystr[5];   // move the ' ' character
mystr[8] = mystr[6];   // move the 'W' character
mystr[9] = mystr[7];   // move the 'o' character
mystr[10] = mystr[8];  // move the 'r' character
mystr[11] = mystr[9];  // move the 'l' character
mystr[12] = mystr[10]; // move the 'd' character
mystr[13] = mystr[11]; // move the NULL character

The above code looks like it would shift everything to the right properly, doesn't it? It won't though; just look at subscript seven (7). First it is set to the value of subscript five (5), and that completely overwrites whatever value was there before. Next subscript seven (7) is used to set subscript nine (9). Rather than shifting, both subscript seven (7) and nine (9) have now become the same value: a blank space. In fact, because we're shifting characters by two indices, the rest of the string would only have two values (' ' and 'W') repeated over and over, even over the NULL character! This would be a run-time logic error rather than a syntax error. The syntax is all correct, but the end result of the logic is not what was intended.

Now, setting subscripts one statement at a time is one way of moving the " World" characters over by two, but it is a tedious amount of code. Knowing how to use loops effectively with arrays, and therefore strings, lessens the code. The following loop shifts the " World" characters as well:

int i;

for (i = 13; i > 6; i--)
    mystr[i] = mystr[i - 2];

Using loops like this is a bit less apparent and uses an extra variable (i), but it is a huge timesaver and the concept can be applied to arbitrary strings. Using loops with pointers to strings can be even faster, as you'll see later. Again you'll notice that the loop works in reverse, from the last character to be set to the first.
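To show how the same loop might be applied to arbitrary strings, here is a sketch of it wrapped in a function. The names shift_right and hello_to_goodbye, and the idea of passing the array's size as 'capacity', are my own assumptions for illustration rather than anything standard.

```cpp
// Hypothetical helper: opens a gap of 'count' characters at index 'pos'
// by shifting everything from 'pos' through the NULL character to the
// right, working backwards from the last character to the first.
// Assumes count >= 0 and 0 <= pos <= the string's length.
// Returns false, changing nothing, if the array is too small.
bool shift_right(char *str, int capacity, int pos, int count)
{
    int len = 0;
    while (str[len] != 0)   // find the NULL character
        len++;

    if (len + 1 + count > capacity)
        return false;

    for (int i = len + count; i >= pos + count; i--)
        str[i] = str[i - count];
    return true;
}

// Turns "Hello World" into "Goodbye World": make room first, then
// overwrite the front of the string.
bool hello_to_goodbye(char *str, int capacity)
{
    const char goodbye[] = "Goodbye";

    if (!shift_right(str, capacity, 5, 2))
        return false;
    for (int i = 0; i < 7; i++)
        str[i] = goodbye[i];
    return true;
}
```

With a fourteen (14) subscript array holding "Hello World", shift_right(mystr, 14, 5, 2) runs exactly the loop above, from subscript thirteen (13) down to seven (7). With only twelve (12) subscripts it refuses, because the shifted string would not fit.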

Initializing Strings

Rather than put characters into a 'char' array one at a time, there is an easier method. If you use a string literal as an initializer to a character array, the entire contents, including the NULL character, will be used to initialize the array. All of the following are valid ways of initializing strings:

const char hw[] = { 'H', 'e', 'l', 'l', 'o', ' ',
                    'W', 'o', 'r', 'l', 'd', 0 };

char mystr1[12] = "Hello World";

char mystr2[] = "Hello World";

Note that a named constant array such as 'hw' cannot itself be used as an initializer. The compiler rejects a declaration like 'char mystr3[] = hw;' because one array cannot initialize another.

These unique assignments cannot happen outside of initialization, or at least will not yield the expected results. It is much like how the initializer syntax for an array or structure cannot be used once it has already been created.

Pointers to Strings

Pointers can be used to represent strings by pointing to the arrays that hold the characters. The pointer will act almost identically to the array and also has some additional functionality. Recall that you can use arithmetic on pointers to move through memory quickly and directly. Obviously string pointers can also be dangerous, and their purpose is more difficult to detect. A pointer can be NULL, or can point to things other than valid memory or an array. Consider the following:

char mystr[] = "Hello World";
char *p = mystr;

The variable 'p' now points to the string "Hello World", which can also be accessed by 'mystr'. A character pointer can be used in the same way a character array is used: for input with 'cin', for output with 'cout', and for direct modification of each subscript in the array:

cout << p << endl;
p[0] = 'h';
cout << p << endl;

The above code placed after the earlier sample would yield the output:

Hello World
hello World

But what if 'p' was actually a pointer to a single 'char' variable rather than an entire string?

char c;
p = &c;

Using 'p', without dereferencing it, in 'cout' or 'cin' would assume that 'p' points to a NULL-terminated array of characters rather than a single character. The danger is very clear, but can easily be avoided. It is also possible to declare a pointer to a literal:

const char *p = "Hello World";

The danger is that the literal should not be modified. Older compilers let you declare the pointer as plain 'char *', but the literal's data may live in read-only memory and writing to it may cause errors, so current C++ requires 'const char *' here.

Cin and Strings

There are several ways to get user input to strings with 'cin'. The first is the traditional way, using 'cin' with the extraction operator '>>':

char str[20];
cin >> str;

The immediate issues with this are that it only puts the first word into the string, and there's no way to limit the number of characters that are stored. So if you typed in "Hello World", only "Hello" would be stored in 'str'. If you typed "Supercalifragilisticexpialidocious" it would try to put all of the characters into 'str'. Obviously something will go wrong because 'str' can only hold twenty (20) total characters (nineteen (19) visible characters and one (1) NULL character).

There are some functions that are part of 'cin' that can be used to get user input to strings. Unfortunately I have to ask you to take a blind leap similar to that of initially breaking into the circle (Chapter 2), because I'm going to introduce some functions that are members of 'cin', which is an object. I will not cover objects or member functions until later on; please just accept the fact that what I'm about to show you works :). The first is 'cin.getline()', which is a function that accepts a string (through a pointer) followed by an integer[3]:

cin.getline(char *dest, int maxchars);

The first parameter is the destination for the input (i.e. the character string to put the input into). The second parameter is the maximum number of characters that can be put into the destination string. With the second parameter you can make sure that only the string's memory is touched, nothing past it. For example, to input characters into 'str':

cin.getline(str, 20);

An advantage of this object function ('getline()' is a function that is part of the 'cin' object) is that you can type any characters in and they will all go into the string. This includes spaces, commas, etc. If you type in "Hello World" then the whole thing will be put into the destination ('str'). If you type in "Supercalifragilisticexpialidocious" then only "Supercalifragilisti" (the first nineteen (19) characters) will be put into 'str'. The remaining characters are not stored; a standard-conforming library leaves them waiting in the input buffer and flags 'cin' as failed until you call 'cin.clear()'. Only nineteen (19) characters are copied because the remaining one is reserved for the NULL character (without it the string would be a string of beads with no knot at the end to keep them together :)).
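You can try the character limit without typing anything by feeding 'getline()' from a string instead of the keyboard. The sketch below uses 'std::istringstream' as a stand-in for 'cin'; both are input streams and share the same 'getline()' member. The function name read_line is hypothetical.

```cpp
#include <iostream>
#include <sstream>

// Hypothetical helper: reads one line into 'dest' the way 'cin.getline()'
// would, but from any input stream. Returns the number of visible
// characters that were stored.
int read_line(std::istream &in, char *dest, int maxchars)
{
    in.getline(dest, maxchars);   // stores at most maxchars - 1 characters

    int n = 0;
    while (dest[n] != 0)          // count up to the NULL character
        n++;
    return n;
}
```

As a usage example, create 'std::istringstream in("Supercalifragilisticexpialidocious\n");' and call read_line(in, str, 20) with the twenty (20) subscript 'str' from before. Only the first nineteen (19) characters are stored, and 'in.fail()' reports that the line did not fit. Passing 'cin' as the first argument reads from the keyboard instead.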

There is a problem with 'cin.getline()' when built on some compilers (to be done: insert compiler names with this problem). When you put two calls to 'cin.getline()' back to back, the second is totally skipped over and a blank string is put into the second's destination:

char str1[20], str2[20];

cout << "Input first name: ";
cin.getline(str1, 20);

cout << "Input last name: ";
cin.getline(str2, 20);

If you put this into a program and you're not able to input a last name, then your compiler has this problem as well. To fix it you'll need to put the following just after each call to 'cin.getline()'[4]:

while (cin.rdbuf()->in_avail())
    cin.rdbuf()->sbumpc();

This hideous code clears out the input buffer.[5] My suggestion to prevent having to write this a lot is to make a 'getline()' function yourself that automatically clears the buffer after calling 'cin.getline()':

void getline(char *pstr, int max)
{
    cin.getline(pstr, max);

    while (cin.rdbuf()->in_avail())
        cin.rdbuf()->sbumpc();
}

Then to input a string, you would simply call:

getline(str, 20);

Notice the omission of 'cin.' because the function called is the one we've written, not the one belonging to 'cin'. There are other functions for the input of strings, but you will find that 'cin.getline()' is probably the most useful at this point, so I will skip the rest.

The following program gives an example of string input using 'cin.getline()', clearing the input buffer (to prevent an endless "blank" input), and string output with 'cout':

#include <iostream>

using std::cin;
using std::cout;
using std::endl;

void getline(char *, int);

int main()
{
    char first[32], last[32];
    cout << "Enter first name: ";
    getline(first, 32);
    cout << "Enter last name: ";
    getline(last, 32);
    cout << "You are " << first << " " << last << endl;
    cout << "(" << last << ", " << first << ")" << endl;
    return 0;
}

// Reads a line into 'pstr', then clears whatever is left in the input buffer.
void getline(char *pstr, int max)
{
    cin.getline(pstr, max);
    while (cin.rdbuf()->in_avail())
        cin.rdbuf()->sbumpc();
}

Example output of this program is:

Enter first name: Neil
Enter last name: Obremski
You are Neil Obremski
(Obremski, Neil)

The output obviously depends on what you enter. This program should work fine even on compilers with a bad implementation of 'cin.getline()' because we clear the input buffer after each call to it (inside the custom function 'getline()').

String Assignment

Assignment operations do not work to copy strings. You can only assign a value to a string when it is initialized. The following will not work:

char str[24] = "Hello World";
str = "Goodbye World";

And remember the following will not work as you expect:

char str[24] = "Hello World";
const char *p = str;
p = "Goodbye World";

What is 'p' pointing to after the above code is executed? It is pointing to the memory containing the literal string "Goodbye World"; it is not pointing to 'str' any longer. When you use an assignment operation with a pointer, it changes the pointer value. A pointer's value is a memory address. Thus, when you assign a literal to a pointer, you are simply setting the pointer's value, or memory address, to the address of the literal data. You are not affecting whatever variable the pointer may have been pointing to before. The following causes no problems:

const char *p = 0;
p = "Hello World";

The assignment operation puts a new memory address in 'p'; it doesn't affect what 'p' was pointing to previous to the assignment. The only way to set the value of a string is to set each individual character. There are several ways to do this. You've seen it done previously with a loop, accessing each subscript via an index. Now I'll explain how to do the same thing with a pointer.

Conceptually, a character pointer contains the memory address of a single 'char' variable. It can also contain the memory address of the first character in a string of characters. Either way it contains the memory address of a single character. If you remember back to pointer arithmetic, you can move the pointer linearly through memory using arithmetic assignment operations:

const char *p = "Hello World";
p++;

After this code is executed, 'p' will be pointing to the 'e' character in the literal string "Hello World". Nothing has happened to the characters of "Hello World" in memory; the pointer 'p' is simply pointing to the second byte in memory from the start of the "Hello World" memory. This means you can ghost through memory using pointers without affecting it in any way[6].

Dereferencing a character pointer will yield a single character value:

const char *p = "Hello World";
p++;
char c = *p;

The variable 'c' will contain the character 'e' after the above code is executed. The byte of data at the memory address pointed to by 'p' is copied to the variable 'c'. You can use this set of operations to copy all of the data from one string to another:

char str[12];

char *p = ΓÇ£Hello WorldΓÇ¥;

char *wp = str;

while (*pt)

{

┬á┬á┬á *wp = *pt;

┬á┬á┬á pt++;

┬á┬á┬á wp++;

}

*wp = 0;

cout << str << endl;

At the end of this code, ΓÇÿstrΓÇÖ will contain the string ΓÇ£Hello WorldΓÇ¥.┬á There are two pointers used: ΓÇÿwpΓÇÖ and ΓÇÿptΓÇÖ.┬á The first, ΓÇÿwpΓÇÖ, is a ΓÇ£writeΓÇ¥ pointer because values are assigned to it when it is dereferenced.┬á The second pointer, ΓÇÿptΓÇÖ, is a ΓÇ£readΓÇ¥ pointer because values are extracted from it when it is dereferenced.┬á The loop continues until ΓÇÿptΓÇÖ is pointing to the address of a NULL character:

while (*pt)

After that the character pointed to by ΓÇÿptΓÇÖ is copied to the address pointed to by ΓÇÿwpΓÇÖ:

*wp = *pt;

The pointers are then both incremented to move to the next memory address:

pt++;

wp++;

If the above lines of code were not present, the pointers would never be pointing to anything except what they were initialized to and the loop would be infinite (it could never logically end).┬á These increments cause the characters to be sequentially copied from the literal string into the variable string.┬á The line immediately following the loop caps off the end of the string with a NULL character:

*wp = 0;

Had this not been there, the string would have no terminating character and printing it would not work as expected (it might even cause a crash). Assignment is copying data from one place to another; thus it is a copy operation. The term copy is used with strings more frequently than the term assignment, perhaps because of the names of the standard string helper functions.

The standard function which does what I just illustrated is called ‘strcpy’ or “string copy”. To use it you must include the file ‘string.h’ at the top of your program (somewhere around the same block as your ‘#include <iostream.h>’):

#include <string.h>

It takes two character pointer (character string) parameters:

char *strcpy(char *dest, const char *src);

The first parameter is a pointer to the destination character string; the destination of the string copy. The second parameter is a pointer to the source character string. The return value is simply ‘dest’ again. Rather than explicitly use character pointers to copy strings, you can use the ‘strcpy’ function. The previous code (copying “Hello World” into ‘str’) can be duplicated like so using ‘strcpy’:

char str[12];

strcpy(str, "Hello World");

There is no length verification with this function, however. That is, if you declare ‘str’ to store only twelve (12) characters (eleven (11) visible and one (1) NULL character) and try to copy in a string longer than that, a problem will occur because memory outside ‘str’ will be touched. A safer function than ‘strcpy’ is ‘strncpy’:

char *strncpy(char *dest, const char *src, size_t max);

Not only are its first two parameters identical to those of ‘strcpy’, but there is an additional parameter specifying the maximum number of characters that should be written to ‘dest’. Following through with previous examples:

char str[12];

strncpy(str, "Hello World", 12);

This function ensures that no more than the given number of characters is written to ‘dest’. When the source string is shorter than the limit, as it is here (eleven (11) visible characters plus the NULL character make twelve), the NULL character is copied along with the rest. Be careful, though: if the source holds as many visible characters as the limit or more, ‘strncpy’ writes exactly that many characters and does not add a NULL character. To be safe, copy at most one character less than the size of ‘dest’ and set the last element yourself:

char str[12];

strncpy(str, "Goodbye World", 11);

str[11] = 0;

The string ‘str’ will contain “Goodbye Wor” (the first eleven (11) visible characters). Not all of the visible characters are copied, and because the last element is set by hand the string is still properly terminated.
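
Because the standard ‘strncpy’ only writes a terminating NULL character when the source is shorter than the limit, it is common to wrap the call in a small helper that always terminates the destination. The following is a minimal sketch of that idea; the name ‘copy_string’ and its parameters are my own invention for illustration, not a standard function:

```cpp
#include <string.h>

// Hypothetical helper: copies 'src' into 'dest' without ever writing more
// than 'size' bytes, and always leaves 'dest' NULL-terminated.
// 'size' is the full size of the destination array and must be at least 1.
void copy_string(char *dest, const char *src, size_t size)
{
    strncpy(dest, src, size - 1);  // leave room for the terminator
    dest[size - 1] = 0;            // terminate even if 'src' was cut short
}
```

With a twelve character array, ‘copy_string(str, "Goodbye World", sizeof(str))’ leaves “Goodbye Wor” in ‘str’. Passing ‘sizeof(str)’ only works when ‘str’ is a true array; with a plain pointer you would have to pass the size yourself.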

SizeOf String

Getting the size of a character array is not the same as getting the size of the string stored in it. Usually only the characters up to the point of the NULL character are the ones you want to count. The ‘sizeof’ keyword, on the other hand, represents the size of the entire array or pointer. The size of a string is more commonly referred to as its length. The length of a string is the number of visible characters from the beginning of the string to the NULL character:

char str[36] = "Hello World";

int i, length = 0;

for (i = 0; str[i]; i++)

    length++;

cout << "The length of '" << str << "' is " << length << endl;

The output of the above will be:

The length of 'Hello World' is 11

This code tallies the number of visible characters from the first byte of ‘str’ to its NULL character. The condition used to keep the loop counting is simply ‘str[i]’. A NULL character has a numeric value of zero (0). Therefore when the index ‘i’ represents the NULL character subscript, the condition will be false and the loop will end.

A utility function which does exactly this is ‘strlen’. Its one parameter is a character pointer to a string and its return value is the length of the string passed in, as an unsigned integer type named ‘size_t’:

size_t strlen(const char *str);

This function does no checking of the string you pass in. If you pass a NULL string in it will cause a problem.

Author’s Opinion: Unless a function explicitly mentions how it handles a NULL parameter, you should not assume that it does.
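
To see the difference between the size of an array and the length of the string inside it, here is a small sketch. The array ‘greeting’ and the three function names are made up for this illustration; ‘safe_length’ shows one way (an assumption of mine, not a standard facility) to guard ‘strlen’ against a NULL string:

```cpp
#include <string.h>

char greeting[36] = "Hello World";

// Size of the whole array in bytes: always 36, whatever text it holds.
size_t array_size() { return sizeof(greeting); }

// Number of characters before the NULL character: 11 for "Hello World".
size_t text_length() { return strlen(greeting); }

// Hypothetical wrapper that treats a NULL pointer as a zero-length string
// instead of passing it on to strlen().
size_t safe_length(const char *str)
{
    return str ? strlen(str) : 0;
}
```

Note that ‘sizeof’ applied to a character pointer, rather than an array, gives the size of the pointer itself, which has nothing to do with the text it points to.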

Concatenation

Appending a string to the end of a string is known as concatenation. To do this manually you simply start copying characters to the string starting at the original NULL character. One of the ways to find the NULL character is to get a pointer to the string and add the length of the string to it:

char str[32] = "Hello World";

char *p = str + strlen(str);

After the above code is executed, the pointer ‘p’ will be pointing to the NULL character’s memory address. You can simply start pumping characters in at that point to append to the string:

*p = '!';

p++;

*p = '!';

p++;

*p = 0;

cout << str << endl;

Don’t forget to end the string with a NULL character. The above will result in the output:

Hello World!!

The two exclamation points were appended by writing the new characters at the subscript of the original NULL character and the one just past it. A new terminating NULL was then set just past the last visible character.

The function ‘strcpy’ can also be used to append data to a string. This function takes the address of the first byte in a string. To append data to an already-existing string you would simply pass it the address of the NULL character. It would then copy the new string in starting at the NULL character of the original string:

char str[32] = "Hello World";

strcpy(str + strlen(str), "!!");

Concatenation of a string can also be done with the more specialized ‘strcat()’ or string concatenation function. This takes two character pointer parameters like ‘strcpy’, but it automatically finds the end of the first string and attaches the second:

char *strcat(char *dest, const char *src);

Thus, to conclude our example of appending two exclamations to “Hello World”:

char str[32] = "Hello World";

strcat(str, "!!");

Author’s Preference: These functions are usually just fine for most circumstances. However, when connecting arbitrary strings I typically use ‘strncpy’ because I can specify a limit on the number of characters copied. This can prevent memory outside the destination string from being written to.
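
As a sketch of that preference, the helper below appends with ‘strncpy’ and a limit. The name ‘append_string’ and its ‘size’ parameter are hypothetical; it assumes ‘dest’ already holds a properly terminated string and that ‘size’ is the full size of the destination array:

```cpp
#include <string.h>

// Hypothetical helper: appends 'src' to the string already in 'dest'
// without writing past 'size' bytes in total. 'dest' must already hold a
// NULL-terminated string.
void append_string(char *dest, const char *src, size_t size)
{
    size_t used = strlen(dest);              // subscript of the current NULL
    if (used + 1 >= size)
        return;                              // no room for anything more
    strncpy(dest + used, src, size - used - 1);
    dest[size - 1] = 0;                      // always terminate
}
```

If the text being appended does not fit, it is silently cut short rather than spilling into neighboring memory. Whether truncation is acceptable depends on your program; you may prefer to report it.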

Comparing Strings

Comparing two strings for equality is not as simple as using the equality operator (==). It can surely be used, but it will not work as expected. When you use the name of a string you are referring to the address of its first character. By using the equality operator you would simply be comparing memory addresses:

char str1[] = "Hello World";

char str2[] = "Hello World";

if (str1 == str2)

{

    // this will never happen because the condition will

    // always be false.

}

To effectively compare two strings you must do it a mind-boggling single character at a time. Because of this, comparing two strings is much more intensive than comparing two simple numbers. With the simple numbers the computer itself can automatically tell you the difference, whereas with strings each character itself is a number and must be tested individually. A standard function for comparing two strings is strcmp() which accepts two strings and returns an ‘int’:

int strcmp(const char *, const char *);

This function will return a negative number if the first string is less than the second, a zero (0) if they are equal, and a positive number if the first string is greater than the second. A string is greater than another if the deciding character (the first different one) between the two has a larger value, and likewise for the lesser comparison.

There are some things to note about this. One is that it is case-sensitive. Two strings will not be the same if their characters are of different casing. Determining equality with this function can be confusing since it returns zero (0), a logical “false”, if the strings are equal:

char str1[] = "Hello World";

char str2[] = "Hello World";

char str3[] = "Goodbye World";

if (0 == strcmp(str1, str2))

{

    cout << str1 << " and " << str2 << " are equal" << endl;

}

if (0 != strcmp(str1, str3))

{

    cout << str1 << " is not equal with " << str3 << endl;

}

The output of the above code would be:

Hello World and Hello World are equal

Hello World is not equal with Goodbye World

The function strncmp() is identical to strcmp() except that it allows you to specify the maximum number of characters from each string to compare, much like strncpy() is to strcpy().
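
The ordering rule and the limited comparison can both be tried out with two small helpers. This is only a sketch: ‘compare_sign’ and ‘has_prefix’ are names I made up, and the ordering shown assumes an ASCII character table, where upper case letters have smaller values than lower case ones:

```cpp
#include <string.h>

// Hypothetical helper: reduces the result of strcmp() to -1, 0, or 1.
// Only the sign of strcmp()'s return value is meaningful.
int compare_sign(const char *a, const char *b)
{
    int result = strcmp(a, b);
    if (result < 0) return -1;
    if (result > 0) return 1;
    return 0;
}

// Hypothetical helper: true when 'str' begins with 'prefix'. strncmp()
// compares no more than strlen(prefix) characters of each string.
bool has_prefix(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

Here ‘compare_sign("Hello", "Help")’ is negative because the deciding characters are ‘l’ and ‘p’, while ‘compare_sign("hello", "Hello")’ is positive because ‘h’ has a larger value than ‘H’ in ASCII.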

Another issue with string comparison is deciding whether or not a string contains nothing. There are three ways a string can be in this state. The first is a NULL string or a string whose address is zero. This applies to pointer strings only because a character array will never have a NULL address[7]. The second way is an empty string which is one that begins with a NULL character. Because its first character is NULL, it has a length of zero (0) visible characters. The third and last way is a blank string which is one that contains only white space characters such as spaces and tabs. It is inarguably a string of nothing, but it still has character content.

The first two ways are easy to determine, but the last requires a character by character assessment. Determining a NULL string is as simple as comparing it to zero or NULL itself. Deciding if a string is empty can be done by checking if the first character is NULL or if the string has a length of zero (possibly using strlen()). But checking a string for blankness, also a nothing-state, is left up to you as there is no standard function. How would you write a function to determine if a string contained nothing?
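
Try it yourself before reading on. One possible answer, offered as a sketch rather than the definitive solution, checks the three states in order. The name ‘is_nothing’ is hypothetical, and the standard ‘isspace()’ function from <ctype.h> is used to decide what counts as white space:

```cpp
#include <ctype.h>

// One possible answer: true when 'str' is a NULL pointer, an empty string,
// or a string made only of white space characters.
bool is_nothing(const char *str)
{
    if (str == 0)
        return true;                       // NULL string
    for (; *str; str++)
    {
        // isspace() expects a value in the unsigned char range
        if (!isspace((unsigned char)*str))
            return false;                  // found real content
    }
    return true;                           // empty or blank
}
```

An empty string never enters the loop, so it falls through to the same result as a blank one.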

Strings and Numbers

A popular question is how to convert a string to a number and vice versa. Obviously, writing this yourself would be a gruesome task because you have to compare each character in the string individually, decide which digit and place it represents in the number, and compound it with previous results to eventually get a whole number. I know it takes a bit of work because I wrote some string to number conversion functions when I was first beginning, before I stumbled upon atoi() and eventually strtol().
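
To make the digit-by-digit idea concrete, here is a stripped-down sketch of such a conversion. The name ‘string_to_int’ is hypothetical, and the function deliberately handles only non-negative decimal numbers; signs, leading white space, overflow, and other numbering systems are where the real work lies:

```cpp
// Hypothetical hand-rolled converter for non-negative decimal strings.
// It stops at the first character that is not a digit and does not
// guard against overflow, so treat it as an illustration only.
int string_to_int(const char *str)
{
    int value = 0;
    while (*str >= '0' && *str <= '9')
    {
        // move the previous digits up one place, then add the new digit
        value = value * 10 + (*str - '0');
        str++;
    }
    return value;
}
```

Subtracting ‘0’ from a digit character yields the digit’s numeric value because the digit characters sit next to each other, in order, in the character table.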

First, please remind yourself that a string containing a number is completely different from an actual number value. Consider:

char str[] = "123";

int x = 123;

The numeric value of ‘x’ above is one hundred and twenty-three (123). It’s a single numeric value. The numeric value of ‘str’ is made up of four values: one for each visible character (‘1’ == 49, ‘2’ == 50, ‘3’ == 51 in ASCII) and one for the NULL character which has a numeric value of zero (0). This is why conversion is needed. The string must have all of its characters interpreted as part of a whole, single numeric value. You cannot simply assign ‘str’ to ‘x’:

x = str;

Even if you forced that to work through clever casting (‘x = (int)str’) you would not get the desired results. What would ‘x’ contain after the assignment operation? It would contain the address of ‘str’ rather than the number ‘123’. If you have better luck than a leprechaun the address might actually be the same as that, but instead of relying on chance you can convert the string into a number.

Possibly the simplest string to number conversion function is ‘atoi()’. To use it and the other functions I will explain here, you must include <stdlib.h>. The name of <stdlib.h> appears to be an abbreviation for “Standard Library”. The things provided by this file are very diverse and can have little in common. Anyway, this function atoi() returns an ‘int’ and accepts a string parameter:

int atoi(const char *);

Therefore to use it you pass it a string and assign the return value to an ‘int’, or output it, or whatever. Getting back to our previous example:

char str[] = "123";

int x = atoi(str);

The ‘atoi()’ function interprets every digit of the string and computes the number it represents. After the above code is executed, ‘x’ would contain the numeric value one hundred and twenty-three (123). Voila! The sibling function ‘atol()’ works in the same way, but it returns a ‘long’ value rather than an ‘int’, which means it can interpret and return larger numbers on some systems. A ‘long’ is never smaller than an ‘int’, and on many modern systems the two are the same size.

Converting strings to floating point numbers is even more complex than to whole (integer) numbers, but fortunately the helper functions are much the same. The function ‘atof()’ returns a ‘double’ which contains the floating point numerical representation of the string passed in. Both this function and its integer mates will return zero if the string cannot be properly interpreted as a number:

char str[] = "hello";

int x = atoi(str);

The above code will place the value ‘0’ into ‘x’ because “hello” cannot be converted into an integer. A limitation of ‘atoi()’ and ‘atol()’ is that they understand only decimal (base 10) notation, not the notations of other numbering systems. Thus if you were to place a hexadecimal number in a string and convert it to a number, you would not get the desired result:

char str[] = "0x123";

int num = atoi(str);

The above code places ‘0’ into ‘num’: the parser sees ‘0’, then sees ‘x’, which is not a decimal digit, and determines that ‘0’ is the only number in the string. These functions also give no sign of failure, so a result of zero can mean either that the string really was “0” or that it could not be interpreted at all. You should be fairly certain that the string you’re converting is actually a number. Unfortunately ‘atoi()’ and its mates cannot tell you whether a string was properly converted; with them your best bet is just to try.
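
If you would rather check before converting, you can inspect the characters yourself first. The sketch below is one way to do that under simple assumptions: it accepts only an optional sign followed by decimal digits, it does not check whether the number fits in an ‘int’, and the names ‘is_decimal_number’ and ‘to_int_or’ are hypothetical:

```cpp
#include <ctype.h>
#include <stdlib.h>

// Hypothetical check: true when 'str' is an optional sign followed by
// one or more decimal digits and nothing else.
bool is_decimal_number(const char *str)
{
    if (str == 0)
        return false;
    if (*str == '+' || *str == '-')
        str++;                              // skip a leading sign
    if (*str == 0)
        return false;                       // no digits at all
    for (; *str; str++)
    {
        if (!isdigit((unsigned char)*str))
            return false;                   // stray character such as 'x'
    }
    return true;
}

// Hypothetical helper: converts only when the check passes; otherwise it
// returns the caller's 'fallback' value.
int to_int_or(const char *str, int fallback)
{
    return is_decimal_number(str) ? atoi(str) : fallback;
}
```

With this, ‘to_int_or("0x123", -1)’ gives -1 instead of a misleading zero, while ‘to_int_or("123", -1)’ gives 123.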

Advanced functions like ‘strtol()’ and ‘strtod()’ exist which allow greater control over the conversion. These, however, contain parameter types that are beyond the prerequisites of this chapter and I don’t want to give you a headache just yet. :) Or a second one that is!

Reversing all of these concepts is converting a numeric value into a string of characters. Can you believe that no standard functions exist for this purpose? This is not entirely true, but for the most part there are no functions dedicated to simple conversion from number to string. Note that when you’re using “cout” to put data onto the screen, you’re converting it to character and string data. The console “screen” is actually a giant grid of characters. Each row is a string and each column is a character. Thus, we can use the same functionality to convert numbers to strings, but instead of printing to the screen we can print to a string.

Fortunately I won’t drag your eyeballs through the mud of doing this with ‘cout’-style functionality at this point. The simplest way is through the function ‘sprintf()’ which C programmers will recognize as a close relative of ‘printf()’ (C’s version of ‘cout’). First off make sure you include <stdio.h> so that you can use it. Now, the syntax of this function is complex so mine below is intentionally simplified:

int sprintf(char *string, const char *format, int number);

The ‘format’ parameter is a string that describes how to convert the number into a string. That is, it specifies leading zeroes, trailing zeroes, numbering system, and other fun stuff. The type of the ‘number’ parameter may vary depending on your format string. For now, just stick to ‘int’ variables. For ‘format’ use “%d” when converting ‘int’ variables (or any integer literal). Make sure the string you pass in has enough space to hold the converted number. The most digits a 32-bit integer can use is ten (10), so I usually make sure my string is declared for at least twelve characters (consider that a negative sign might be present as well as the NULL character):

char str[12];

int x = 123;

sprintf(str, "%d", x);

cout << "str is " << str << endl;

The above code converts ‘x’ into a string and copies that into ‘str’. The output would be:

str is 123


Compound Strings

Because strings are actually sequences of multiple characters, a single string might contain many letters, words, paragraphs, and even pages of text. It is important to keep reminding yourself how string variables work. A string variable represents the address of its first character. You can wield this to your advantage with pointer arithmetic by using pieces of strings as if they were whole strings themselves.

How would you generate a string that contained two parts: a string prefix and a numerical suffix? That is, the tail end of the string is added by converting a number to a string. One way would be:

int x;

cout << "Enter favorite number: ";

cin >> x;

char str[64] = "User's favorite number is ";

char fav[12];

sprintf(fav, "%d", x);

strcat(str, fav);

Or you could have the number converted into a string and copied directly into the whole result using pointer arithmetic and avoid using a separate string altogether:

sprintf(str + strlen(str), "%d", x);

Can you guess how the above code works? The string ‘str’ contains a valid string of characters terminated by a NULL character. The variable ‘str’ itself represents the address of that first character. In the expression we add the result of ‘strlen()’, which returns the number of visible characters, to ‘str’ which is the address of its first character. The result of this expression is the address of the NULL character in ‘str’; that is, the character just past the last visible character. So, since ‘sprintf()’ copies its string result, one character at a time, into that address the number is effectively appended to the end.

Now, what if we wanted to divine the number at the end of this string? If we call ‘atoi()’ with the entire string it will return zero (0) because “User’s” is not a number. :) Instead we add twenty-six (26), the length of the prefix, to ‘str’ as we pass it to ‘atoi()’:

x = atoi(str + 26);

Thus the function receives the address of the first character of the number converted to a string. It has no way of knowing that the address it receives is not a whole string and no reason to believe otherwise. The function does its job and converts the string it receives (which would only be the number, not the whole string) and returns. As a final touch, let’s restore ‘str’ to its original string value; in a larger program this might be so that a new number can be attached to the end. One way would be to completely re-copy in the original value:

strcpy(str, "User's favorite number is ");

However, we already know that the prefix is twenty-six (26) characters long, so its last character is at index twenty-five (25) and its terminating NULL character belongs at index twenty-six (26). Why not just insert a NULL character directly there? This is how that would be done:

str[26] = 0;

This code would place a numeric zero (0), or NULL character, at the index twenty-six (26). A string is still a character array so this is perfectly valid. Notice how a string takes advantage of both array and pointer syntax. By placing the NULL character here, the remaining characters have been effectively “chopped off”. Rather, they still exist but any functions utilizing the string will stop at the terminating NULL character and never look past.
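
The three steps (append a number, read it back, chop it off) can be collected into a short sketch. The helper names ‘append_number’ and ‘number_after_prefix’ are hypothetical, and the caller is assumed to leave at least twelve spare characters in the destination for the converted number:

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: appends the decimal text of 'number' to the string
// in 'str' by printing at the address of its NULL character. The caller
// must make sure 'str' has at least twelve spare bytes.
void append_number(char *str, int number)
{
    sprintf(str + strlen(str), "%d", number);
}

// Hypothetical helper: reads the number stored after a prefix of
// 'prefix_length' characters.
int number_after_prefix(const char *str, size_t prefix_length)
{
    return atoi(str + prefix_length);
}
```

In use, you might record the prefix length with ‘strlen(str)’ before appending, pass it to ‘number_after_prefix’ afterward, and finally write ‘str[prefix_length] = 0’ to restore the prefix. Measuring the length saves you from counting characters by hand and getting twenty-six (26) wrong.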

Compound strings would be those that are made of multiple parts and possibly separate pieces of data. These pieces of data can be extracted or modified by manipulating the string appropriately. Strings are sometimes used to contain large blobs of data for this very reason. Rather than deal with a huge structure, you deal with a block of text that must be parsed.

String Manipulation

Remember that to modify one “whole” string you must do it one character at a time. A string is a sequence of characters and therefore any modification done to the “string” is actually done to each element in the sequence. Previously I explained how to change the casing of a single character using the ‘toupper()’ and ‘tolower()’ functions. With this knowledge and the knowledge of strings you should be able to write a similar function that changes the case of the entire string.
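
If you want to compare your attempt against something, here is one possible sketch. The name ‘string_to_upper’ is hypothetical; the function walks the string with a pointer and applies the standard ‘toupper()’ from <ctype.h> to each character in place:

```cpp
#include <ctype.h>

// Hypothetical helper: converts every character of 'str' to upper case,
// in place. 'str' must point to writable memory, not a string literal.
void string_to_upper(char *str)
{
    for (; *str; str++)
    {
        // toupper() expects a value in the unsigned char range and
        // returns characters that have no upper case form unchanged
        *str = (char)toupper((unsigned char)*str);
    }
}
```

A lower case version would be identical apart from calling ‘tolower()’ instead.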

String manipulation is a big part of programming because much of how we interact is through language. Computers enjoy their numbers, perhaps a bit too much, but most humans can’t communicate properly on a diet of numbers alone. Because of the complexity of utilizing strings of text in computers, there are many standard utilities that come with all modern C++ software to assist you. Some of these you have already seen with the functions I introduced previously.

Other standard helpers to manipulate strings will come along later in this book. One of the most common factors among high-level languages in the present day is the idea of making strings behave like other primitive variables. That is, imagine being able to actually do the following in C++:

string msg = "Hello ";

msg += "World";

cout << msg;

The above code lacks the numbers and calculations required of us thus far. In fact, to do the same thing with what I’ve written it might look like this:

char msg[64] = "Hello ";

strcat(msg, "World");

cout << msg;

Although strings might seem burdensome in C++, bear with me. I am starting you out at the ground floor and we have not yet reached the escalator. What you have seen is how strings really work. But there are many facilities to aid you in string manipulation and many do abstract them into single data types rather than lists of characters. You may even find that the previous code above is more than just a pipe dream.

Unicode Strings

Understanding different code pages isn’t a problem as long as everyone uses ASCII, but not everyone does. Languages like Japanese and Russian cannot be written with the ASCII character set; they require different (or multiple!) code pages. But again, if you’re working on a program only for English this isn’t a problem. Even close relatives like Spanish and French need accented letters that plain ASCII lacks, although a single Western European code page covers them.

Globalization is the process of making one’s program usable by a global audience; i.e. those of different languages and customs. There are more things than just language which make each group of people unique, but that’s the only thing I’ll cover here. Localization is the process of making one’s program usable by a particular audience. If you continue to write programs for Americans (because our English is slightly different from that of our British brothers) then you are localizing your program to America. Some companies, like Microsoft, build several versions of their programs: a globalized one that is generically sensitive to language and other things and localized ones which are sensitive to customs in specific regions. The localized versions still have the ability to display any other language; they are simply more tailored for a specific region.

Making a program display multiple languages through the normal character systems is fairly complex, because you must switch code pages in order to display different characters for different languages. This switching is beyond the scope of this document, however. I want to discuss the alternative which is Unicode. Unicode is a single gigantic character set that contains every necessary character for every language on the face of the earth … and then some (like smilies and such :)). Achieving this with normal character literals and variables is quite impossible as they are normally 8 bits on most systems which is only large enough to store one (1) of two hundred fifty-six (256) different values. Thus, Unicode characters are stored in “wide” character variables and literals.

On many systems (Windows compilers in particular) the equivalent of a wide character is a short integer which requires two (2) bytes and has sixty-five thousand five hundred thirty-six (65,536) possible values; other systems use four (4) bytes for a wide character. So a variable to store these Unicode characters has been around for a long time, but what about literals? At the time of writing this there are still compilers which do not support “wide” character literals. You’ll know if yours does if you can precede a literal character with an “L”:

L'c'

This causes the character to be a wide character and, on a two-byte system, to require two bytes of storage. As luck (and engineers) would have it, the first one hundred twenty-eight (128) indices in Unicode are identical to those of ASCII, and the first two hundred fifty-six (256) match the Latin-1 (ISO 8859-1) table. Thus an ASCII character becomes Unicode simply by putting it into a wider storage unit rather than a single byte.

Literal wide strings are made in the same way as wide characters: by preceding the string with an “L”:

L"Hello World"

Each character in the string now requires two bytes (again assuming two-byte wide characters), so the total amount of storage required for the above string (“Hello World”) is twenty-four (24) bytes: eleven (11) visible characters and one (1) NULL character, at two bytes each. That’s right, the NULL character like any other in Unicode requires two (2) bytes.
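
You can check the arithmetic on your own system with a short sketch. It uses the standard type ‘wchar_t’ and the function ‘wcslen()’ from <wchar.h>, the wide counterpart of ‘strlen()’; the array and function names are made up for the illustration:

```cpp
#include <wchar.h>

const wchar_t wide_greeting[] = L"Hello World";

// Number of wide characters before the terminating NULL character: 11.
size_t wide_length() { return wcslen(wide_greeting); }

// Total storage in bytes, terminator included: 12 * sizeof(wchar_t).
// That is 24 where a wide character is two bytes, 48 where it is four.
size_t wide_storage() { return sizeof(wide_greeting); }
```

The length in characters is the same everywhere; only the storage in bytes depends on how wide your compiler makes ‘wchar_t’.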

Although useful for programs destined for multiple cultures, Unicode may not have a place in yours, especially at an early stage in programming. It appears to be a big thing though, so as you move on in your programming career you may see it more and more.

[1] As long as the size of a byte is 8 bits … that too can differ from system to system.

[2] I have to assume you know the English alphabet, not only because you have to read my bad English, but there is no identifying word for each letter! Aye?

[3] There are more intriguing aspects to this function like a default parameter and overloaded versions; I’m trimming out these things for simplicity’s sake.

[4] This phenomenon occurs because of the way certain operating systems store new line control data in plain text. Unices and Mac use only a single line-feed (character 10 or 0xA) character, but Windows, DOS, OS/2 and others use a carriage return followed by a line feed (characters 13 and 10 or 0x0D0A). Bad implementations on these systems get data up to the end of the first carriage return or line feed rather than a combination of the two. What results is an input buffer that is endlessly blank. The only solution is to explicitly clear up the entire buffer with that ugly code.

[5] The functionality used for this code is straight from the C++ Standard and should work on all modern compilers.

[6] The arithmetic operations on pointers simply change the numerical address stored in the pointer; it’s when you try to dereference the pointer that you try accessing actual memory. Accessing some memory (like address zero) causes an error to happen.

[7] Unless it is a dynamic array which has not yet been covered.